<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Highscore - The Boost C++ Libraries - String Handling</title>
<link rel="stylesheet" href="css/highscore.css" type="text/css">
<link rev="made" href="mailto:boris@highscore.de">
<link rel="home" href="frontpage.html" title="The Boost C++ Libraries">
<link rel="up" href="frontpage.html" title="The Boost C++ Libraries">
<link rel="prev" href="eventhandling.html" title="Chapter 4: Event Handling">
<link rel="next" href="multithreading.html" title="Chapter 6: Multithreading">
<link rel="chapter" href="introduction.html" title="Chapter 1: Introduction">
<link rel="chapter" href="smartpointers.html" title="Chapter 2: Smart Pointers">
<link rel="chapter" href="functionobjects.html" title="Chapter 3: Function Objects">
<link rel="chapter" href="eventhandling.html" title="Chapter 4: Event Handling">
<link rel="chapter" href="stringhandling.html" title="Chapter 5: String Handling">
<link rel="chapter" href="multithreading.html" title="Chapter 6: Multithreading">
<link rel="chapter" href="asio.html" title="Chapter 7: Asynchronous Input and Output">
<link rel="chapter" href="interprocesscommunication.html" title="Chapter 8: Interprocess Communication">
<link rel="chapter" href="filesystem.html" title="Chapter 9: Filesystem">
<link rel="chapter" href="datetime.html" title="Chapter 10: Date and Time">
<link rel="chapter" href="serialization.html" title="Chapter 11: Serialization">
<link rel="chapter" href="parser.html" title="Chapter 12: Parser">
<link rel="chapter" href="containers.html" title="Chapter 13: Containers">
<link rel="chapter" href="datastructures.html" title="Chapter 14: Data Structures">
<link rel="chapter" href="errorhandling.html" title="Chapter 15: Error Handling">
<link rel="chapter" href="castoperators.html" title="Chapter 16: Cast Operators">
<link rel="section" href="stringhandling.html#stringhandling_general" title="5.1 General">
<link rel="section" href="stringhandling.html#stringhandling_locale" title="5.2 Locales">
<link rel="section" href="stringhandling.html#stringhandling_stringalgorithms" title="5.3 Boost.StringAlgorithms">
<link rel="section" href="stringhandling.html#stringhandling_regex" title="5.4 Boost.Regex">
<link rel="section" href="stringhandling.html#stringhandling_tokenizer" title="5.5 Boost.Tokenizer">
<link rel="section" href="stringhandling.html#stringhandling_format" title="5.6 Boost.Format">
<link rel="section" href="stringhandling.html#stringhandling_exercises" title="5.7 Exercises">
<meta http-equiv="pics-label" content='(pics-1.1 "http://www.icra.org/ratingsv02.html" l gen true for "http://www.highscore.de" r (nz 1 vz 1 lz 1 oz 1 cz 1) gen true for "http://highscore.de" r (nz 1 vz 1 lz 1 oz 1 cz 1))'>
<meta http-equiv="Content-Style-Type" content="text/css">
<meta http-equiv="Content-Script-Type" content="text/javascript">
<link href="http://www.highscore.de/favicon.ico" rel="shortcut icon" type="image/vnd.microsoft.icon">
<script type="text/javascript" src="js/jquery-1.3.2.min.js"></script><script type="text/javascript" src="js/jquery.event.drag-1.5.min.js"></script><script type="text/javascript" src="js/highscore.js"></script>
</head>
<body>
<div lang="en" class="docbook chapter" title="Chapter 5: String Handling">
<p class="title">The Boost C++ Libraries</p>
<script type="text/javascript">
          var titlepage = "Front page";
        
      var titles = new Array(titlepage,
      
        "Chapter 1: Introduction",
      
        "Chapter 2: Smart Pointers",
      
        "Chapter 3: Function Objects",
      
        "Chapter 4: Event Handling",
      
        "Chapter 5: String Handling",
      
        "Chapter 6: Multithreading",
      
        "Chapter 7: Asynchronous Input and Output",
      
        "Chapter 8: Interprocess Communication",
      
        "Chapter 9: Filesystem",
      
        "Chapter 10: Date and Time",
      
        "Chapter 11: Serialization",
      
        "Chapter 12: Parser",
      
        "Chapter 13: Containers",
      
        "Chapter 14: Data Structures",
      
        "Chapter 15: Error Handling",
      
        "Chapter 16: Cast Operators",
      
      "");

      
          var titlehtml = "frontpage.html";
        
      var filenames = new Array(titlehtml,
      
        "introduction.html",
      
        "smartpointers.html",
      
        "functionobjects.html",
      
        "eventhandling.html",
      
        "stringhandling.html",
      
        "multithreading.html",
      
        "asio.html",
      
        "interprocesscommunication.html",
      
        "filesystem.html",
      
        "datetime.html",
      
        "serialization.html",
      
        "parser.html",
      
        "containers.html",
      
        "datastructures.html",
      
        "errorhandling.html",
      
        "castoperators.html",
      
      "");

      
      document.open();
      document.write('<form action="" class="toc">');
      document.write('<select size="1" onchange="location.href=options[selectedIndex].value">');
      for (var i = 0; i < titles.length && i < filenames.length; ++i) {
        if (titles[i] != "" && filenames[i] != "") {
          document.write('<option');
          document.write(' value="' + filenames[i] + '"');
          var expr = new RegExp('[/\]' + filenames[i] + '$');
          if (expr.test(location.href)) {
            document.write(' selected="selected"');
          }
          document.write('>' + titles[i] + '<\/option>');
        }
      }
      document.write('<\/select>');
      document.write('<\/form>');
      document.close();
      
    </script><noscript><p class="toc"><a href="toc.html">Table of Contents</a></p></noscript>
<hr class="hrhead">
<h1 class="title">
<a name="stringhandling"></a><small>Chapter 5:</small> String Handling</h1>
<hr>
<div class="toc">
<h3>Table of Contents</h3>
<ul>
<li><span class="sect1"><a href="stringhandling.html#stringhandling_general">5.1 General</a></span></li>
<li><span class="sect1"><a href="stringhandling.html#stringhandling_locale">5.2 Locales</a></span></li>
<li><span class="sect1"><a href="stringhandling.html#stringhandling_stringalgorithms">5.3 Boost.StringAlgorithms</a></span></li>
<li><span class="sect1"><a href="stringhandling.html#stringhandling_regex">5.4 Boost.Regex</a></span></li>
<li><span class="sect1"><a href="stringhandling.html#stringhandling_tokenizer">5.5 Boost.Tokenizer</a></span></li>
<li><span class="sect1"><a href="stringhandling.html#stringhandling_format">5.6 Boost.Format</a></span></li>
<li><span class="sect1"><a href="stringhandling.html#stringhandling_exercises">5.7 Exercises</a></span></li>
</ul>
</div>
<p class="license"><a href="http://creativecommons.org/licenses/by-nc-nd/3.0/" rel="license" target="_top"><img src="img/88x31_cc_logo.gif" alt="" width="88" height="31"></a> This book is licensed under a <a href="http://creativecommons.org/licenses/by-nc-nd/3.0/" rel="license" target="_top">Creative Commons License</a>.</p>
<hr>
<h2 class="title">
<a name="stringhandling_general"></a>5.1 General</h2>
<div class="sect1"><p>In the C++ standard, strings are handled by the <code class="classname">std::string</code> class, which offers many functions for manipulating them, for example functions that search a string for a specific character or return a substring. Even though <code class="classname">std::string</code> provides more than 100 functions, which makes it one of the more bloated classes of the C++ standard, many developers still miss functionality they need in their daily work. For example, while Java and .NET provide functions to convert a string to uppercase, there is no equivalent in <code class="classname">std::string</code>. The Boost C++ Libraries presented in this chapter try to close this gap.</p></div>
<hr>
<h2 class="title">
<a name="stringhandling_locale"></a>5.2 Locales</h2>
<div class="sect1">
<p>Before the Boost C++ Libraries are introduced, it is worth taking a brief look at locales, since many functions outlined in this chapter expect a locale as an additional parameter.</p>
<p>Locales are used in the C++ standard to encapsulate cultural conventions such as the currency symbol, date and time formats, the symbol used to separate the integer portion of a number from the fractional one (radix character) as well as the symbol used for grouping numbers with more than three digits (thousands separator).</p>
<p>For string handling, the locale determines which letters a particular culture uses and how they are ordered. For instance, whether an alphabet contains umlauts such as 'ä' and where they are placed in the alphabet depends on the culture.</p>
<p>If a function is called that converts a given string to uppercase, the individual steps taken depend on the particular locale. In the German language, it is obvious that the letter 'ä' is converted to 'Ä'; however, this does not necessarily hold true for other cultures as well.</p>
<p>When working with <code class="classname">std::string</code>, locales can be ignored since none of its functions depends on a particular culture. To work with the Boost C++ Libraries in this chapter, however, a basic understanding of locales is required.</p>
<p>The C++ standard defines a class named <code class="classname">std::locale</code> in <code class="filename">locale</code>. Every C++ program automatically has one instance of this class, the global locale, which cannot be accessed directly. Instead, a separate object of type <code class="classname">std::locale</code> is created via the default constructor, which initializes it with the same properties as the global locale.</p>
<pre class="programlisting">#include &lt;locale&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::locale loc; 
  std::cout &lt;&lt; loc.name() &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.2.1/main.cpp">Download source code</a></li></ul>
<p>The above program will output <code class="computeroutput">C</code> on the standard output stream which is the name of the classic locale. This locale contains descriptions used by default in programs developed with the C language.</p>
<p>This also happens to be the default global locale for every C++ application. Its conventions largely match those of American English: the radix character is a period, and displaying a date causes the month to be written in English. The classic locale itself does not define a currency symbol, though; a symbol such as the dollar sign only becomes available once a locale for a particular culture has been selected.</p>
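<p>As a brief illustration of what the classic locale contains, the following sketch queries it directly through <code class="function">std::locale::classic()</code> and the standard <code class="classname">std::numpunct</code> facet. The helper name <code class="function">radix_of()</code> is made up for this example and is not part of any library.</p>
<pre class="programlisting">#include &lt;locale&gt;
#include &lt;iostream&gt;
#include &lt;string&gt;

// Hypothetical helper: returns the radix character a locale uses for numbers.
char radix_of(const std::locale &amp;loc)
{
  return std::use_facet&lt;std::numpunct&lt;char&gt; &gt;(loc).decimal_point();
}

int main()
{
  const std::locale &amp;classic = std::locale::classic();
  std::cout &lt;&lt; classic.name() &lt;&lt; std::endl;    // name of the classic locale
  std::cout &lt;&lt; radix_of(classic) &lt;&lt; std::endl; // its radix character
} </pre>
<p>For the classic locale, the program prints <code class="computeroutput">C</code> followed by a period.</p>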
<p>The global locale can be changed using the static function <code class="methodname">global()</code> of the <code class="classname">std::locale</code> class.</p>
<pre class="programlisting">#include &lt;locale&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::locale::global(std::locale("German")); 
  std::locale loc; 
  std::cout &lt;&lt; loc.name() &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.2.2/main.cpp">Download source code</a></li></ul>
<p>The static <code class="methodname">global()</code> function expects an object of type <code class="classname">std::locale</code> as its sole parameter. A locale object for a particular culture can be created with a different constructor of the class that expects a character string of type <code class="type">const char*</code>. However, names of locales are not standardized, except for the classic locale, which is named "C". Which names are actually accepted therefore depends on the C++ standard library in use. With Visual Studio 2008, the definitions for the German culture can be selected using the language string "German" as outlined in the <a class="link" href="http://msdn.microsoft.com/en-us/library/39cwe7zf.aspx">documentation of language strings</a>.</p>
<p>The program will output <code class="computeroutput">German_Germany.1252</code> if executed. Specifying "German" as the language string selects the definitions for the German primary language and sublanguage as well as code page 1252.</p>
<p>If a different regional variant of German is required, for example the one used in Switzerland, a different language string can be used.</p>
<pre class="programlisting">#include &lt;locale&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::locale::global(std::locale("German_Switzerland")); 
  std::locale loc; 
  std::cout &lt;&lt; loc.name() &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.2.3/main.cpp">Download source code</a></li></ul>
<p>Now, the program will output <code class="computeroutput">German_Switzerland.1252</code> instead.</p>
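<p>Because locale names differ between standard libraries, a program that has to run on more than one platform may want to try several names in turn. The following is a minimal sketch under stated assumptions: the helper <code class="function">first_available()</code> is hypothetical, and the name "de_DE.UTF-8" is an assumption about a typical POSIX system rather than a name every system provides. The only library behavior relied on is that the constructor of <code class="classname">std::locale</code> throws <code class="classname">std::runtime_error</code> for a name it does not accept.</p>
<pre class="programlisting">#include &lt;locale&gt;
#include &lt;iostream&gt;
#include &lt;stdexcept&gt;
#include &lt;string&gt;

// Hypothetical helper: tries each candidate name in turn and falls back to
// the classic locale if the standard library accepts none of them.
std::locale first_available(const char *const *names, int count)
{
  for (int i = 0; i &lt; count; ++i)
  {
    try
    {
      return std::locale(names[i]);
    }
    catch (const std::runtime_error &amp;)
    {
      // This name is unknown to the standard library in use; try the next one.
    }
  }
  return std::locale::classic();
}

int main()
{
  // "de_DE.UTF-8" is an assumption about a typical POSIX system.
  const char *names[] = { "German", "de_DE.UTF-8" };
  std::locale loc = first_available(names, 2);
  std::cout &lt;&lt; loc.name() &lt;&lt; std::endl;
} </pre>
<p>If none of the names is accepted, the program prints <code class="computeroutput">C</code>.</p>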
<p>With an understanding of locales in general and of how the global one can be changed, the following example shows how locales affect string handling.</p>
<pre class="programlisting">#include &lt;locale&gt; 
#include &lt;iostream&gt; 
#include &lt;cstring&gt; 

int main() 
{ 
  std::cout &lt;&lt; std::strcoll("ä", "z") &lt;&lt; std::endl; 
  std::locale::global(std::locale("German")); 
  std::cout &lt;&lt; std::strcoll("ä", "z") &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.2.4/main.cpp">Download source code</a></li></ul>
<p>The example uses the <code class="function">std::strcoll()</code> function defined in <code class="filename">cstring</code> to determine whether the first string is lexicographically less than the second one, in other words, which of the two strings would be found first in a dictionary.</p>
<p>If executed, the program prints <code class="computeroutput">1</code> followed by <code class="computeroutput">-1</code>; only the sign of the value returned by <code class="function">std::strcoll()</code> is guaranteed, not its magnitude. Even though the function is called with the same arguments, the results differ. The reason is simple: when <code class="function">std::strcoll()</code> is called the first time, the global C locale is used. By the time it is called again, the global locale has been changed to the definitions for the German culture. The order of the two characters 'ä' and 'z' differs between these locales, as indicated by the output.</p>
<p>Numerous C functions as well as C++ streams access locales. Although the functions of the <code class="classname">std::string</code> class work independently of locales, many of the functions outlined in the following sections do not. Hence, locales are met again several times throughout this chapter.</p>
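<p>In C++, the same kind of comparison is available without the C function: an object of type <code class="classname">std::locale</code> can be called like a function to compare two strings according to its collation rules, which allows it to be passed to <code class="function">std::sort()</code> directly. The following sketch uses the classic locale only so that it behaves the same on every platform; the helper name <code class="function">sorted_with()</code> is made up for this example.</p>
<pre class="programlisting">#include &lt;locale&gt;
#include &lt;iostream&gt;
#include &lt;algorithm&gt;
#include &lt;cstddef&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical helper: sorts a copy of the strings using the collation rules of loc.
std::vector&lt;std::string&gt; sorted_with(std::vector&lt;std::string&gt; v, const std::locale &amp;loc)
{
  std::sort(v.begin(), v.end(), loc); // the locale object acts as the comparison function
  return v;
}

int main()
{
  std::vector&lt;std::string&gt; v;
  v.push_back("apple");
  v.push_back("Zebra");
  v = sorted_with(v, std::locale::classic());
  for (std::size_t i = 0; i &lt; v.size(); ++i)
    std::cout &lt;&lt; v[i] &lt;&lt; std::endl;
} </pre>
<p>With the classic locale, uppercase letters sort before lowercase ones, so <code class="computeroutput">Zebra</code> is printed before <code class="computeroutput">apple</code>. A locale for a particular culture may order the two strings differently.</p>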
</div>
<hr>
<h2 class="title">
<a name="stringhandling_stringalgorithms"></a>5.3 Boost.StringAlgorithms</h2>
<div class="sect1">
<p>The Boost C++ library <a class="link" href="http://www.boost.org/doc/libs/1_36_0/doc/html/string_algo.html">Boost.StringAlgorithms</a> provides many stand-alone functions for string manipulation. Strings can be of type <code class="classname">std::string</code>, <code class="classname">std::wstring</code> or any other instantiation of the class template <code class="classname">std::basic_string</code>.</p>
<p>The functions are categorized within different header files. For example, functions converting from uppercase to lowercase are defined in <code class="filename">boost/algorithm/string/case_conv.hpp</code>. Since Boost.StringAlgorithms consists of more than 20 different categories and as many header files, <code class="filename">boost/algorithm/string.hpp</code> acts as the common header including all other header files for convenience. All of the following examples will use this combined header.</p>
<p>As mentioned in the previous section, many functions of the Boost.StringAlgorithms library expect an object of type <code class="classname">std::locale</code> as an additional parameter. However, this parameter is optional: if it is not provided, the global locale is used.</p>
<pre class="programlisting">#include &lt;boost/algorithm/string.hpp&gt; 
#include &lt;locale&gt; 
#include &lt;iostream&gt; 
#include &lt;clocale&gt; 

int main() 
{ 
  std::setlocale(LC_ALL, "German"); 
  std::string s = "Boris Schäling"; 
  std::cout &lt;&lt; boost::algorithm::to_upper_copy(s) &lt;&lt; std::endl; 
  std::cout &lt;&lt; boost::algorithm::to_upper_copy(s, std::locale("German")) &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.3.1/main.cpp">Download source code</a></li></ul>
<p>The <code class="function">boost::algorithm::to_upper_copy()</code> function is used to convert a string to uppercase. Naturally, there also exists a function doing the opposite: <code class="function">boost::algorithm::to_lower_copy()</code> converts a string to lowercase. Both functions return the converted string as result. If the passed string itself should be converted, the functions <code class="function">boost::algorithm::to_upper()</code> or <code class="function">boost::algorithm::to_lower()</code> can be used instead.</p>
<p>The above example converts the string "Boris Schäling" to uppercase using <code class="function">boost::algorithm::to_upper_copy()</code>. The first call uses the default global locale while the second call explicitly states the locale for the German culture.</p>
<p>The latter call certainly yields a correctly converted string since the corresponding uppercase character 'Ä' exists for the lowercase 'ä'. In the C locale, by contrast, 'ä' is an unknown character and thus is not converted. To yield correct results, either pass the correct locale explicitly or modify the global locale before calling <code class="function">boost::algorithm::to_upper_copy()</code>.</p>
<p>Note that the program uses <code class="function">std::setlocale()</code>, defined in <code class="filename">clocale</code>, to set the locale for the C functions. With the standard library used here, <var>std::cout</var> internally uses C functions to display information on the screen. By setting the correct locale, umlauts such as 'ä' and 'Ä' are displayed correctly.</p>
<pre class="programlisting">#include &lt;boost/algorithm/string.hpp&gt; 
#include &lt;locale&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::locale::global(std::locale("German")); 
  std::string s = "Boris Schäling"; 
  std::cout &lt;&lt; boost::algorithm::to_upper_copy(s) &lt;&lt; std::endl; 
  std::cout &lt;&lt; boost::algorithm::to_upper_copy(s, std::locale("German")) &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.3.2/main.cpp">Download source code</a></li></ul>
<p>The above program sets the German culture for the global locale which causes the first call to <code class="function">boost::algorithm::to_upper_copy()</code> to use the corresponding definitions for converting 'ä' to 'Ä'.</p>
<p>Please note that <code class="function">std::setlocale()</code> is not called in this example. Setting the global locale with the <code class="function">std::locale::global()</code> function automatically sets the C locale as well. In practice, C++ programs almost always set the global locale using <code class="function">std::locale::global()</code> rather than using <code class="function">std::setlocale()</code> as seen in the previous example.</p>
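<p>The difference between the copying functions and the in-place functions <code class="function">boost::algorithm::to_upper()</code> and <code class="function">boost::algorithm::to_lower()</code> mentioned earlier can be sketched as follows. To keep the sketch independent of the platform and the source file encoding, it passes the classic locale explicitly and spells the name without an umlaut; both choices are assumptions made for this illustration only.</p>
<pre class="programlisting">#include &lt;boost/algorithm/string.hpp&gt;
#include &lt;locale&gt;
#include &lt;iostream&gt;
#include &lt;string&gt;

int main()
{
  std::string s = "Boris Schaeling";
  // to_lower_copy() returns a converted copy and leaves s untouched.
  std::cout &lt;&lt; boost::algorithm::to_lower_copy(s, std::locale::classic()) &lt;&lt; std::endl;
  std::cout &lt;&lt; s &lt;&lt; std::endl;
  // to_upper() converts s itself and returns nothing.
  boost::algorithm::to_upper(s, std::locale::classic());
  std::cout &lt;&lt; s &lt;&lt; std::endl;
} </pre>
<p>The program prints <code class="computeroutput">boris schaeling</code>, <code class="computeroutput">Boris Schaeling</code> and <code class="computeroutput">BORIS SCHAELING</code>.</p>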
<pre class="programlisting">#include &lt;boost/algorithm/string.hpp&gt; 
#include &lt;locale&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::locale::global(std::locale("German")); 
  std::string s = "Boris Schäling"; 
  std::cout &lt;&lt; boost::algorithm::erase_first_copy(s, "i") &lt;&lt; std::endl; 
  std::cout &lt;&lt; boost::algorithm::erase_nth_copy(s, "i", 0) &lt;&lt; std::endl; 
  std::cout &lt;&lt; boost::algorithm::erase_last_copy(s, "i") &lt;&lt; std::endl; 
  std::cout &lt;&lt; boost::algorithm::erase_all_copy(s, "i") &lt;&lt; std::endl; 
  std::cout &lt;&lt; boost::algorithm::erase_head_copy(s, 5) &lt;&lt; std::endl; 
  std::cout &lt;&lt; boost::algorithm::erase_tail_copy(s, 8) &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.3.3/main.cpp">Download source code</a></li></ul>
<p>Boost.StringAlgorithms provides several functions to delete characters or substrings from a string. How and where the deletion should occur can be explicitly specified. For example, a particular character can be removed from the complete string by using <code class="function">boost::algorithm::erase_all_copy()</code>. If only the first occurrence of the character should be removed, <code class="function">boost::algorithm::erase_first_copy()</code> ought to be used instead. Likewise, <code class="function">boost::algorithm::erase_last_copy()</code> removes the last occurrence, and <code class="function">boost::algorithm::erase_nth_copy()</code> removes the occurrence with the given zero-based index. To shorten the string by a specific number of characters on either end, the functions <code class="function">boost::algorithm::erase_head_copy()</code> and <code class="function">boost::algorithm::erase_tail_copy()</code> can be used accordingly.</p>
<pre class="programlisting">#include &lt;boost/algorithm/string.hpp&gt; 
#include &lt;locale&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::locale::global(std::locale("German")); 
  std::string s = "Boris Schäling"; 
  boost::iterator_range&lt;std::string::iterator&gt; r = boost::algorithm::find_first(s, "Boris"); 
  std::cout &lt;&lt; r &lt;&lt; std::endl; 
  r = boost::algorithm::find_first(s, "xyz"); 
  std::cout &lt;&lt; r &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.3.4/main.cpp">Download source code</a></li></ul>
<p>Different functions such as <code class="function">boost::algorithm::find_first()</code>, <code class="function">boost::algorithm::find_last()</code>, <code class="function">boost::algorithm::find_nth()</code>, <code class="function">boost::algorithm::find_head()</code> and <code class="function">boost::algorithm::find_tail()</code> are available to find strings within strings.</p>
<p>All of these functions have in common that they return a pair of iterators in the form of an object of type <code class="classname">boost::iterator_range</code>. This class originates from the Boost C++ Library <a class="link" href="http://www.boost.org/libs/range/">Boost.Range</a> which defines a range concept based on the iterator concept. Since the <code class="code">&lt;&lt;</code> operator is overloaded for <code class="classname">boost::iterator_range</code>, the result of the individual search algorithm can be directly written to the standard output stream. The above program prints <code class="computeroutput">Boris</code> for the first result and an empty string for the second one.</p>
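<p>Since the returned range consists of iterators into the searched string, it can be used for more than printing. The following sketch, which spells the name without an umlaut so that it does not depend on the encoding, checks whether a match was found, computes its position and copies it into a separate string.</p>
<pre class="programlisting">#include &lt;boost/algorithm/string.hpp&gt;
#include &lt;iostream&gt;
#include &lt;string&gt;

int main()
{
  std::string s = "Boris Schaeling";
  boost::iterator_range&lt;std::string::iterator&gt; r = boost::algorithm::find_first(s, "Sch");
  if (!r.empty())
  {
    // The iterators refer to s, so the offset of the match can be computed.
    std::cout &lt;&lt; (r.begin() - s.begin()) &lt;&lt; std::endl;
    // Copy the matched characters into a separate string.
    std::cout &lt;&lt; std::string(r.begin(), r.end()) &lt;&lt; std::endl;
  }
  // An unsuccessful search yields an empty range.
  r = boost::algorithm::find_first(s, "xyz");
  std::cout &lt;&lt; r.empty() &lt;&lt; std::endl;
} </pre>
<p>The program prints <code class="computeroutput">6</code>, <code class="computeroutput">Sch</code> and <code class="computeroutput">1</code>.</p>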
<pre class="programlisting">#include &lt;boost/algorithm/string.hpp&gt; 
#include &lt;locale&gt; 
#include &lt;iostream&gt; 
#include &lt;vector&gt; 

int main() 
{ 
  std::locale::global(std::locale("German")); 
  std::vector&lt;std::string&gt; v; 
  v.push_back("Boris"); 
  v.push_back("Schäling"); 
  std::cout &lt;&lt; boost::algorithm::join(v, " ") &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.3.5/main.cpp">Download source code</a></li></ul>
<p>A container of strings is passed as the first parameter to the <code class="function">boost::algorithm::join()</code> function which concatenates them separated by the second parameter. The example will output <code class="computeroutput">Boris Schäling</code> accordingly.</p>
<pre class="programlisting">#include &lt;boost/algorithm/string.hpp&gt; 
#include &lt;locale&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::locale::global(std::locale("German")); 
  std::string s = "Boris Schäling"; 
  std::cout &lt;&lt; boost::algorithm::replace_first_copy(s, "B", "D") &lt;&lt; std::endl; 
  std::cout &lt;&lt; boost::algorithm::replace_nth_copy(s, "B", 0, "D") &lt;&lt; std::endl; 
  std::cout &lt;&lt; boost::algorithm::replace_last_copy(s, "B", "D") &lt;&lt; std::endl; 
  std::cout &lt;&lt; boost::algorithm::replace_all_copy(s, "B", "D") &lt;&lt; std::endl; 
  std::cout &lt;&lt; boost::algorithm::replace_head_copy(s, 5, "Doris") &lt;&lt; std::endl; 
  std::cout &lt;&lt; boost::algorithm::replace_tail_copy(s, 8, "Becker") &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.3.6/main.cpp">Download source code</a></li></ul>
<p>Just like functions for searching strings or removing characters from a string, Boost.StringAlgorithms also provides functions for replacing a substring within a string. Among these functions are <code class="function">boost::algorithm::replace_first_copy()</code>, <code class="function">boost::algorithm::replace_nth_copy()</code>, <code class="function">boost::algorithm::replace_last_copy()</code>, <code class="function">boost::algorithm::replace_all_copy()</code>, <code class="function">boost::algorithm::replace_head_copy()</code> and <code class="function">boost::algorithm::replace_tail_copy()</code>. They can be applied the same way as the functions used for searching and removing except that they expect an additional parameter - the replacement string.</p>
<pre class="programlisting">#include &lt;boost/algorithm/string.hpp&gt; 
#include &lt;locale&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::locale::global(std::locale("German")); 
  std::string s = "\t Boris Schäling \t"; 
  std::cout &lt;&lt; "." &lt;&lt; boost::algorithm::trim_left_copy(s) &lt;&lt; "." &lt;&lt; std::endl; 
  std::cout &lt;&lt; "." &lt;&lt; boost::algorithm::trim_right_copy(s) &lt;&lt; "." &lt;&lt; std::endl; 
  std::cout &lt;&lt; "." &lt;&lt; boost::algorithm::trim_copy(s) &lt;&lt; "." &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.3.7/main.cpp">Download source code</a></li></ul>
<p>To remove whitespace from either end of a string, <code class="function">boost::algorithm::trim_left_copy()</code>, <code class="function">boost::algorithm::trim_right_copy()</code> and <code class="function">boost::algorithm::trim_copy()</code> can be used. Which characters count as whitespace depends on the global locale.</p>
<p>For many functions, Boost.StringAlgorithms accepts a predicate as an additional parameter that determines which characters of the string the function is applied to. The predicated versions for trimming a string are named <code class="function">boost::algorithm::trim_left_copy_if()</code>, <code class="function">boost::algorithm::trim_right_copy_if()</code> and <code class="function">boost::algorithm::trim_copy_if()</code> accordingly.</p>
<pre class="programlisting">#include &lt;boost/algorithm/string.hpp&gt; 
#include &lt;locale&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::locale::global(std::locale("German")); 
  std::string s = "--Boris Schäling--"; 
  std::cout &lt;&lt; "." &lt;&lt; boost::algorithm::trim_left_copy_if(s, boost::algorithm::is_any_of("-")) &lt;&lt; "." &lt;&lt; std::endl; 
  std::cout &lt;&lt; "." &lt;&lt; boost::algorithm::trim_right_copy_if(s, boost::algorithm::is_any_of("-")) &lt;&lt; "." &lt;&lt; std::endl; 
  std::cout &lt;&lt; "." &lt;&lt; boost::algorithm::trim_copy_if(s, boost::algorithm::is_any_of("-")) &lt;&lt; "." &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.3.8/main.cpp">Download source code</a></li></ul>
<p>The program in the above example accesses another function named <code class="function">boost::algorithm::is_any_of()</code>, a helper function that creates a predicate. This predicate checks whether the character passed to it occurs in the string given to <code class="function">boost::algorithm::is_any_of()</code>. In this way, the characters to be trimmed can be specified; the example uses the hyphen.</p>
<p>Boost.StringAlgorithms already provides numerous helper functions returning commonly used predicates.</p>
<pre class="programlisting">#include &lt;boost/algorithm/string.hpp&gt; 
#include &lt;locale&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::locale::global(std::locale("German")); 
  std::string s = "123456789Boris Schäling123456789"; 
  std::cout &lt;&lt; "." &lt;&lt; boost::algorithm::trim_left_copy_if(s, boost::algorithm::is_digit()) &lt;&lt; "." &lt;&lt; std::endl; 
  std::cout &lt;&lt; "." &lt;&lt; boost::algorithm::trim_right_copy_if(s, boost::algorithm::is_digit()) &lt;&lt; "." &lt;&lt; std::endl; 
  std::cout &lt;&lt; "." &lt;&lt; boost::algorithm::trim_copy_if(s, boost::algorithm::is_digit()) &lt;&lt; "." &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.3.9/main.cpp">Download source code</a></li></ul>
<p>The predicate returned by <code class="function">boost::algorithm::is_digit()</code> indicates a numeric character by returning the boolean value <code class="literal">true</code>. Helper functions are also provided to check whether or not a character is uppercase or lowercase: <code class="function">boost::algorithm::is_upper()</code> and <code class="function">boost::algorithm::is_lower()</code> respectively. All of these functions use the global locale by default unless otherwise specified by passing a different locale as a parameter.</p>
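<p>The predicates returned by these helper functions can also be combined with the logical operators <code class="code">||</code>, <code class="code">&amp;&amp;</code> and <code class="code">!</code>. The following is a minimal sketch; the input string is chosen for illustration and contains no umlaut so that the result does not depend on the encoding.</p>
<pre class="programlisting">#include &lt;boost/algorithm/string.hpp&gt;
#include &lt;iostream&gt;
#include &lt;string&gt;

int main()
{
  std::string s = "--123Boris Schaeling456--";
  // Trim characters that are either digits or hyphens.
  std::cout &lt;&lt; boost::algorithm::trim_copy_if(s,
    boost::algorithm::is_digit() || boost::algorithm::is_any_of("-")) &lt;&lt; std::endl;
  // Trim every leading character that is not an uppercase letter.
  std::cout &lt;&lt; boost::algorithm::trim_left_copy_if(s,
    !boost::algorithm::is_upper()) &lt;&lt; std::endl;
} </pre>
<p>The program prints <code class="computeroutput">Boris Schaeling</code> followed by <code class="computeroutput">Boris Schaeling456--</code>.</p>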
<p>Besides the predicates that verify individual characters of a string, Boost.StringAlgorithms also offers functions that work with strings instead.</p>
<pre class="programlisting">#include &lt;boost/algorithm/string.hpp&gt; 
#include &lt;locale&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::locale::global(std::locale("German")); 
  std::string s = "Boris Schäling"; 
  std::cout &lt;&lt; boost::algorithm::starts_with(s, "Boris") &lt;&lt; std::endl; 
  std::cout &lt;&lt; boost::algorithm::ends_with(s, "Schäling") &lt;&lt; std::endl; 
  std::cout &lt;&lt; boost::algorithm::contains(s, "is") &lt;&lt; std::endl; 
  std::cout &lt;&lt; boost::algorithm::lexicographical_compare(s, "Boris") &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.3.10/main.cpp">Download source code</a></li></ul>
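<p>The functions <code class="function">boost::algorithm::starts_with()</code>, <code class="function">boost::algorithm::ends_with()</code>, <code class="function">boost::algorithm::contains()</code> and <code class="function">boost::algorithm::lexicographical_compare()</code> all compare two individual strings and return a boolean value. The above program prints <code class="computeroutput">1</code> three times followed by <code class="computeroutput">0</code>: <code class="function">boost::algorithm::lexicographical_compare()</code> checks whether the first string is less than the second one, which is not the case here since "Boris" is a prefix of the longer string.</p>
<p>The following example shows a function that splits a string into smaller parts.</p>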
<pre class="programlisting">#include &lt;boost/algorithm/string.hpp&gt; 
#include &lt;locale&gt; 
#include &lt;iostream&gt; 
#include &lt;vector&gt; 

int main() 
{ 
  std::locale::global(std::locale("German")); 
  std::string s = "Boris Schäling"; 
  std::vector&lt;std::string&gt; v; 
  boost::algorithm::split(v, s, boost::algorithm::is_space()); 
  std::cout &lt;&lt; v.size() &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.3.11/main.cpp">Download source code</a></li></ul>
<p>Using <code class="function">boost::algorithm::split()</code>, a given string can be split into a container based on a certain delimiter. The function requires a predicate as its third parameter indicating for each character whether the string should be split at the given position. The example uses the helper function <code class="function">boost::algorithm::is_space()</code> to create a predicate that will split the string at every space character.</p>
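<p>When a string contains several delimiters in a row, <code class="function">boost::algorithm::split()</code> by default produces an empty string between each pair of them. The following sketch passes <code class="code">boost::algorithm::token_compress_on</code> as a fourth parameter to merge adjacent delimiters; the input string is made up for this illustration.</p>
<pre class="programlisting">#include &lt;boost/algorithm/string.hpp&gt;
#include &lt;iostream&gt;
#include &lt;cstddef&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

int main()
{
  std::string s = "Boris,  Schaeling";
  std::vector&lt;std::string&gt; v;
  // Split at commas and spaces; adjacent delimiters are merged into one.
  boost::algorithm::split(v, s, boost::algorithm::is_any_of(", "),
    boost::algorithm::token_compress_on);
  for (std::size_t i = 0; i &lt; v.size(); ++i)
    std::cout &lt;&lt; v[i] &lt;&lt; std::endl;
} </pre>
<p>The container then holds the two strings <code class="computeroutput">Boris</code> and <code class="computeroutput">Schaeling</code>. Without the fourth parameter, it would additionally contain two empty strings.</p>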
<p>Many of the functions introduced in this section also exist in a version that ignores the case of the string. They typically have the same name except for a leading 'i'. For example, the equivalent to <code class="function">boost::algorithm::erase_all_copy()</code> is <code class="function">boost::algorithm::ierase_all_copy()</code>.</p>
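<p>The effect of the leading 'i' can be sketched with a few of these functions. The name is again spelled without an umlaut, an assumption made only to keep the example independent of the encoding and the locale.</p>
<pre class="programlisting">#include &lt;boost/algorithm/string.hpp&gt;
#include &lt;iostream&gt;
#include &lt;string&gt;

int main()
{
  std::string s = "Boris Schaeling";
  // The case-sensitive version finds no lowercase 'b' and removes nothing.
  std::cout &lt;&lt; boost::algorithm::erase_all_copy(s, "b") &lt;&lt; std::endl;
  // The case-insensitive version removes the uppercase 'B'.
  std::cout &lt;&lt; boost::algorithm::ierase_all_copy(s, "b") &lt;&lt; std::endl;
  std::cout &lt;&lt; boost::algorithm::istarts_with(s, "BORIS") &lt;&lt; std::endl;
  std::cout &lt;&lt; boost::algorithm::iequals(s, "boris schaeling") &lt;&lt; std::endl;
} </pre>
<p>The program prints <code class="computeroutput">Boris Schaeling</code>, <code class="computeroutput">oris Schaeling</code>, <code class="computeroutput">1</code> and <code class="computeroutput">1</code>.</p>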
<p>Finally, it should be noted that many functions of Boost.StringAlgorithms also support regular expressions. The following program uses the <code class="function">boost::algorithm::find_regex()</code> function to search for a regular expression.</p>
<pre class="programlisting">#include &lt;boost/algorithm/string.hpp&gt; 
#include &lt;boost/algorithm/string/regex.hpp&gt; 
#include &lt;locale&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::locale::global(std::locale("German")); 
  std::string s = "Boris Schäling"; 
  boost::iterator_range&lt;std::string::iterator&gt; r = boost::algorithm::find_regex(s, boost::regex("\\w\\s\\w")); 
  std::cout &lt;&lt; r &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.3.12/main.cpp">Download source code</a></li></ul>
<p>In order to use the regular expression, the program accesses a class named <code class="classname">boost::regex</code> which is defined inside the Boost C++ Library Boost.Regex and is presented in the following section.</p>
</div>
<hr>
<h2 class="title">
<a name="stringhandling_regex"></a>5.4 Boost.Regex</h2>
<div class="sect1">
<p>The Boost C++ Library <a class="link" href="http://www.boost.org/libs/regex/">Boost.Regex</a> allows the usage of regular expressions in C++. Regular expressions are a powerful feature of many languages that simplifies searching for a particular string pattern. While C++ currently still needs to resort to a Boost C++ Library, support for regular expressions will become part of the C++ standard library in the future: Boost.Regex is expected to be included in the next revision of the C++ standard.</p>
<p>The two most important classes in Boost.Regex are <code class="classname">boost::regex</code> and <code class="classname">boost::smatch</code>, both defined in <code class="filename">boost/regex.hpp</code>. While the former is used to define a regular expression, the latter will save the search results.</p>
<p>Boost.Regex provides three different functions to search for regular expressions which are introduced below.</p>
<pre class="programlisting">#include &lt;boost/regex.hpp&gt; 
#include &lt;locale&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::locale::global(std::locale("German")); 
  std::string s = "Boris Schäling"; 
  boost::regex expr("\\w+\\s\\w+"); 
  std::cout &lt;&lt; boost::regex_match(s, expr) &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.4.1/main.cpp">Download source code</a></li></ul>
<p><code class="function">boost::regex_match()</code> is used to compare a string with a regular expression. It will return <code class="literal">true</code> only if the expression matches the complete string.</p>
<p>To search a string for a regular expression, <code class="function">boost::regex_search()</code> is available.</p>
<pre class="programlisting">#include &lt;boost/regex.hpp&gt; 
#include &lt;locale&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::locale::global(std::locale("German")); 
  std::string s = "Boris Schäling"; 
  boost::regex expr("(\\w+)\\s(\\w+)"); 
  boost::smatch what; 
  if (boost::regex_search(s, what, expr)) 
  { 
    std::cout &lt;&lt; what[0] &lt;&lt; std::endl; 
    std::cout &lt;&lt; what[1] &lt;&lt; " " &lt;&lt; what[2] &lt;&lt; std::endl; 
  } 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.4.2/main.cpp">Download source code</a></li></ul>
<p><code class="function">boost::regex_search()</code> expects a reference to an object of type <code class="classname">boost::smatch</code> as an additional parameter that is used to store the results. Besides the substring that matches the complete regular expression, the results contain one entry for every grouping. Since the regular expression in the example contains two groupings, two such substrings are returned.</p>
<p>The result storage class <code class="classname">boost::smatch</code> is actually a container holding elements of type <code class="classname">boost::sub_match</code> which can be accessed using an interface similar to the one of <code class="classname">std::vector</code>. For example, elements can be accessed via <code class="methodname">operator[]()</code>.</p>
<p>The class <code class="classname">boost::sub_match</code> on the other hand saves iterators to the specific positions inside a string corresponding to the grouping of a regular expression. Since it is derived from <code class="classname">std::pair</code>, the individual iterators referencing a particular substring can be accessed using <var>first</var> and <var>second</var>. In order to write a substring to the standard output stream, these iterators do not necessarily need to be accessed though as seen in the above example. Using the overloaded <code class="code">&lt;&lt;</code> operator, the substring can be directly written instead.</p>
<p>Please note that since results are stored using iterators, <code class="classname">boost::sub_match</code> does not copy the matched substrings. This implies that they are accessible only as long as the corresponding string, referenced by the iterators, exists.</p>
<p>Furthermore, please note that the first element of the container <code class="classname">boost::smatch</code> stores iterators referencing the string that matches the complete regular expression. The first substring that matches the first grouping is accessible at index 1.</p>
<p>The third function offered by Boost.Regex is <code class="function">boost::regex_replace()</code>.</p>
<pre class="programlisting">#include &lt;boost/regex.hpp&gt; 
#include &lt;locale&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::locale::global(std::locale("German")); 
  std::string s = " Boris Schäling "; 
  boost::regex expr("\\s"); 
  std::string fmt("_"); 
  std::cout &lt;&lt; boost::regex_replace(s, expr, fmt) &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.4.3/main.cpp">Download source code</a></li></ul>
<p>Besides the string to search as well as the regular expression, <code class="function">boost::regex_replace()</code> requires a format that defines how substrings, matching individual groupings of the regular expression, are replaced. In case the regular expression does not contain any grouping, corresponding substrings are replaced one-to-one using the given format. Thus, the above program will output <code class="computeroutput">_Boris_Schäling_</code> as the result.</p>
<p><code class="function">boost::regex_replace()</code> always searches through the complete string for the regular expression. Hence, the program actually replaces all three spaces with underscores.</p>
<pre class="programlisting">#include &lt;boost/regex.hpp&gt; 
#include &lt;locale&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::locale::global(std::locale("German")); 
  std::string s = "Boris Schäling"; 
  boost::regex expr("(\\w+)\\s(\\w+)"); 
  std::string fmt("\\2 \\1"); 
  std::cout &lt;&lt; boost::regex_replace(s, expr, fmt) &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.4.4/main.cpp">Download source code</a></li></ul>
<p>The format can access substrings returned by groupings of the regular expression. The example uses this technique to swap the first with the last name, displaying <code class="computeroutput">Schäling Boris</code> as the result.</p>
<p>Please note that there exist different standards for regular expressions and formats. Each of the three functions takes an additional parameter that selects a specific standard. The same parameter also specifies whether special characters in the format should be interpreted or whether the format should replace the complete string matching the regular expression literally.</p>
<pre class="programlisting">#include &lt;boost/regex.hpp&gt; 
#include &lt;locale&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::locale::global(std::locale("German")); 
  std::string s = "Boris Schäling"; 
  boost::regex expr("(\\w+)\\s(\\w+)"); 
  std::string fmt("\\2 \\1"); 
  std::cout &lt;&lt; boost::regex_replace(s, expr, fmt, boost::regex_constants::format_literal) &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.4.5/main.cpp">Download source code</a></li></ul>
<p>The program passes the <code class="code">boost::regex_constants::format_literal</code> flag as the fourth parameter to <code class="function">boost::regex_replace()</code> to suppress handling of special characters in the format. Since the complete string that matches the regular expression is replaced with the format, the output of the example is <code class="computeroutput">\2 \1</code>.</p>
<p>As indicated at the end of the previous section, regular expressions can also be used with Boost.StringAlgorithms. The library accesses Boost.Regex to provide functions such as <code class="function">boost::algorithm::find_regex()</code>, <code class="function">boost::algorithm::replace_regex()</code>, <code class="function">boost::algorithm::erase_regex()</code> and <code class="function">boost::algorithm::split_regex()</code>. Since Boost.Regex is expected to be part of the upcoming revision of the C++ standard, it is advisable to be proficient in applying regular expressions without the usage of Boost.StringAlgorithms though.</p>
</div>
<hr>
<h2 class="title">
<a name="stringhandling_tokenizer"></a>5.5 Boost.Tokenizer</h2>
<div class="sect1">
<p>The library <a class="link" href="http://www.boost.org/libs/tokenizer/">Boost.Tokenizer</a> allows iterating over partial expressions in a string by interpreting certain characters as separators.</p>
<pre class="programlisting">#include &lt;boost/tokenizer.hpp&gt; 
#include &lt;string&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt; tokenizer; 
  std::string s = "Boost C++ libraries"; 
  tokenizer tok(s); 
  for (tokenizer::iterator it = tok.begin(); it != tok.end(); ++it) 
    std::cout &lt;&lt; *it &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.5.1/main.cpp">Download source code</a></li></ul>
<p>Boost.Tokenizer defines a template class named <code class="classname">boost::tokenizer</code> in <code class="filename">boost/tokenizer.hpp</code>. It expects a class that identifies coherent expressions for its template parameter. The above example uses the class <code class="classname">boost::char_separator</code> which interprets spaces and punctuation marks as separators.</p>
<p>A tokenizer must be initialized with a string of type <code class="classname">std::string</code>. Using the <code class="methodname">begin()</code> and <code class="methodname">end()</code> methods, the tokenizer can be accessed just like a container. Partial expressions of the string used to initialize the tokenizer are available via iterators. How partial expressions are evaluated depends on the kind of class passed as the template parameter.</p>
<p>Since <code class="classname">boost::char_separator</code> interprets spaces and punctuation marks as separators by default, the example displays <code class="computeroutput">Boost</code>, <code class="computeroutput">C</code>, <code class="computeroutput">+</code>, <code class="computeroutput">+</code> and <code class="computeroutput">libraries</code>. In order to identify these characters, <code class="classname">boost::char_separator</code> utilizes both <code class="function">std::isspace()</code> and <code class="function">std::ispunct()</code>. Boost.Tokenizer distinguishes between separators that should be displayed and separators that should be suppressed: By default, spaces are suppressed while punctuation marks are displayed. Hence the two plus signs are displayed accordingly.</p>
<p>If punctuation marks should not be interpreted as separators, the <code class="classname">boost::char_separator</code> object can be initialized accordingly before being passed to the tokenizer. The following example does exactly that.</p>
<pre class="programlisting">#include &lt;boost/tokenizer.hpp&gt; 
#include &lt;string&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt; tokenizer; 
  std::string s = "Boost C++ libraries"; 
  boost::char_separator&lt;char&gt; sep(" "); 
  tokenizer tok(s, sep); 
  for (tokenizer::iterator it = tok.begin(); it != tok.end(); ++it) 
    std::cout &lt;&lt; *it &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.5.2/main.cpp">Download source code</a></li></ul>
<p>The constructor of <code class="classname">boost::char_separator</code> expects a total of three parameters of which only the first one must be supplied. It describes the individual separators that are suppressed. For the given example, spaces are treated as separators just like with the previous example.</p>
<p>The second parameter specifies the separators that are displayed. In case this parameter is omitted, it is empty and thus no separators are displayed at all. If the program is now executed, it displays <code class="computeroutput">Boost</code>, <code class="computeroutput">C++</code> and <code class="computeroutput">libraries</code>.</p>
<p>If a plus sign is passed for the second parameter, the example program behaves just like the first one.</p>
<pre class="programlisting">#include &lt;boost/tokenizer.hpp&gt; 
#include &lt;string&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt; tokenizer; 
  std::string s = "Boost C++ libraries"; 
  boost::char_separator&lt;char&gt; sep(" ", "+"); 
  tokenizer tok(s, sep); 
  for (tokenizer::iterator it = tok.begin(); it != tok.end(); ++it) 
    std::cout &lt;&lt; *it &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.5.3/main.cpp">Download source code</a></li></ul>
<p>The third parameter determines whether or not empty partial expressions are displayed. If two separators are found back-to-back, the corresponding partial expression is empty. By default, these empty expressions are not displayed. Using the third parameter, the default behavior can be manipulated.</p>
<pre class="programlisting">#include &lt;boost/tokenizer.hpp&gt; 
#include &lt;string&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt; tokenizer; 
  std::string s = "Boost C++ libraries"; 
  boost::char_separator&lt;char&gt; sep(" ", "+", boost::keep_empty_tokens); 
  tokenizer tok(s, sep); 
  for (tokenizer::iterator it = tok.begin(); it != tok.end(); ++it) 
    std::cout &lt;&lt; *it &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.5.4/main.cpp">Download source code</a></li></ul>
<p>If executed, the above program displays two additional empty partial expressions. The first one is found between the two plus signs while the second one is found between the second plus sign and the following space.</p>
<p>A tokenizer can also be used with different string types.</p>
<pre class="programlisting">#include &lt;boost/tokenizer.hpp&gt; 
#include &lt;string&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  typedef boost::tokenizer&lt;boost::char_separator&lt;wchar_t&gt;, std::wstring::const_iterator, std::wstring&gt; tokenizer; 
  std::wstring s = L"Boost C++ libraries"; 
  boost::char_separator&lt;wchar_t&gt; sep(L" "); 
  tokenizer tok(s, sep); 
  for (tokenizer::iterator it = tok.begin(); it != tok.end(); ++it) 
    std::wcout &lt;&lt; *it &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.5.5/main.cpp">Download source code</a></li></ul>
<p>This example iterates over a string of type <code class="classname">std::wstring</code> instead. In order to allow this type of string, the tokenizer must be instantiated using additional template parameters. The same applies to the <code class="classname">boost::char_separator</code> class; it also must be instantiated using <code class="type">wchar_t</code> for its template parameter.</p>
<p>Besides <code class="classname">boost::char_separator</code>, Boost.Tokenizer provides two additional classes to identify partial expressions.</p>
<pre class="programlisting">#include &lt;boost/tokenizer.hpp&gt; 
#include &lt;string&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  typedef boost::tokenizer&lt;boost::escaped_list_separator&lt;char&gt; &gt; tokenizer; 
  std::string s = "Boost,\"C++ libraries\""; 
  tokenizer tok(s); 
  for (tokenizer::iterator it = tok.begin(); it != tok.end(); ++it) 
    std::cout &lt;&lt; *it &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.5.6/main.cpp">Download source code</a></li></ul>
<p><code class="classname">boost::escaped_list_separator</code> is used to read multiple values separated by a comma. This format is commonly known as CSV (comma separated values). It also considers double quotes as well as so-called escape sequences accordingly. The output of the example is therefore <code class="computeroutput">Boost</code> and <code class="computeroutput">C++ libraries</code>.</p>
<p>The second class provided is <code class="classname">boost::offset_separator</code> which must be instantiated. The corresponding object must be passed to the constructor of <code class="classname">boost::tokenizer</code> as the second parameter.</p>
<pre class="programlisting">#include &lt;boost/tokenizer.hpp&gt; 
#include &lt;string&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  typedef boost::tokenizer&lt;boost::offset_separator&gt; tokenizer; 
  std::string s = "Boost C++ libraries"; 
  int offsets[] = { 5, 5, 9 }; 
  boost::offset_separator sep(offsets, offsets + 3); 
  tokenizer tok(s, sep); 
  for (tokenizer::iterator it = tok.begin(); it != tok.end(); ++it) 
    std::cout &lt;&lt; *it &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.5.7/main.cpp">Download source code</a></li></ul>
<p><code class="classname">boost::offset_separator</code> specifies the locations within the string at which individual partial expressions end. The above program specifies that the first partial expression ends after 5 characters, the second ends after an additional 5 characters and the third and last ends after the following 9 characters. The output will be <code class="computeroutput">Boost</code>, <code class="computeroutput"> C++ </code> and <code class="computeroutput">libraries</code>.</p>
</div>
<hr>
<h2 class="title">
<a name="stringhandling_format"></a>5.6 Boost.Format</h2>
<div class="sect1">
<p><a class="link" href="http://www.boost.org/libs/format/">Boost.Format</a> offers a replacement for the <code class="function">std::printf()</code> function defined in <code class="filename">cstdio</code>. <code class="function">std::printf()</code> originates from the C standard and allows formatted data output. However, it is neither type-safe nor extensible. In C++ applications, Boost.Format is usually the preferred choice when data should be output in a formatted way.</p>
<p>The library Boost.Format provides a class named <code class="classname">boost::format</code> which is defined in <code class="filename">boost/format.hpp</code>. Similar to <code class="function">std::printf()</code>, a string containing special characters to control formatting is passed to the constructor of <code class="classname">boost::format</code>. The actual data replacing these special characters in the output is linked via the <code class="code">%</code> operator as shown in the following example.</p>
<pre class="programlisting">#include &lt;boost/format.hpp&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::cout &lt;&lt; boost::format("%1%.%2%.%3%") % 16 % 9 % 2008 &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.6.1/main.cpp">Download source code</a></li></ul>
<p>Boost.Format uses numbers placed between two percent signs as placeholders that are later linked to the actual data using the <code class="code">%</code> operator. The above program uses the numbers 16, 9 and 2008 to form a date string in the format of <code class="computeroutput">16.9.2008</code>. In case the month should appear in front of the day, which is common in the United States, the placeholders can simply be swapped.</p>
<pre class="programlisting">#include &lt;boost/format.hpp&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::cout &lt;&lt; boost::format("%2%/%1%/%3%") % 16 % 9 % 2008 &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.6.2/main.cpp">Download source code</a></li></ul>
<p>The program now displays <code class="computeroutput">9/16/2008</code> instead.</p>
<p>To format data using the C++ manipulators, Boost.Format offers a function named <code class="function">boost::io::group()</code>.</p>
<pre class="programlisting">#include &lt;boost/format.hpp&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::cout &lt;&lt; boost::format("%1% %2% %1%") % boost::io::group(std::showpos, 99) % 100 &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.6.3/main.cpp">Download source code</a></li></ul>
<p>The example will display <code class="computeroutput">+99 100 +99</code> as the result. Since the manipulator <code class="function">std::showpos()</code> has been linked to the number <code class="literal">99</code> via <code class="function">boost::io::group()</code>, the plus sign is automatically added whenever <code class="literal">99</code> is displayed.</p>
<p>If the plus sign should only be shown for the first output of <code class="literal">99</code>, the format placeholder needs to be customized.</p>
<pre class="programlisting">#include &lt;boost/format.hpp&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::cout &lt;&lt; boost::format("%|1$+| %2% %1%") % 99 % 100 &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.6.4/main.cpp">Download source code</a></li></ul>
<p>The placeholder %1% has been replaced with %|1$+|. Customizing a format involves more than adding two pipe signs: the reference to the data is also placed between the pipe signs and is written as 1$ instead of 1%. This changes the output to <code class="computeroutput">+99 100 99</code>.</p>
<p>Please note that, even though references to data are optional in general, they must be specified either for all placeholders or none. The following example only provides references for the second and third placeholder but omits them for the first one which generates an error during execution.</p>
<pre class="programlisting">#include &lt;boost/format.hpp&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  try 
  { 
    std::cout &lt;&lt; boost::format("%|+| %2% %1%") % 99 % 100 &lt;&lt; std::endl; 
  } 
  catch (boost::io::format_error &amp;ex) 
  { 
    std::cout &lt;&lt; ex.what() &lt;&lt; std::endl; 
  } 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.6.5/main.cpp">Download source code</a></li></ul>
<p>This program will throw an exception of type <code class="exceptionname">boost::io::format_error</code>. Strictly speaking, Boost.Format throws <code class="exceptionname">boost::io::bad_format_string</code>. Since the different exception classes are all derived from <code class="exceptionname">boost::io::format_error</code>, it is usually easier catching exceptions of this type though.</p>
<p>The following example shows how to write the format string without references to data.</p>
<pre class="programlisting">#include &lt;boost/format.hpp&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::cout &lt;&lt; boost::format("%|+| %|| %||") % 99 % 100 % 99 &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.6.6/main.cpp">Download source code</a></li></ul>
<p>The pipe signs for the second and third placeholder can safely be omitted since they do not specify the format in this case. The resulting syntax then closely resembles the one of <code class="function">std::printf()</code>.</p>
<pre class="programlisting">#include &lt;boost/format.hpp&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::cout &lt;&lt; boost::format("%+d %d %d") % 99 % 100 % 99 &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.6.7/main.cpp">Download source code</a></li></ul>
<p>While the format may look like the one of <code class="function">std::printf()</code>, Boost.Format still provides the advantage of type safety. The letter 'd' within the format string does not indicate the output of a number but rather applies the <code class="function">std::dec()</code> manipulator to the internal stream object used by <code class="classname">boost::format</code>. This makes it possible to specify format strings that would make no sense for <code class="function">std::printf()</code> and might crash the application there during execution.</p>
<pre class="programlisting">#include &lt;boost/format.hpp&gt; 
#include &lt;iostream&gt; 

int main() 
{ 
  std::cout &lt;&lt; boost::format("%+s %s %s") % 99 % 100 % 99 &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/5.6.8/main.cpp">Download source code</a></li></ul>
<p>While <code class="function">std::printf()</code> uses the letter 's' only for strings of type <code class="type">const char*</code>, the above program works perfectly. Boost.Format does not expect a string necessarily but rather incorporates the appropriate manipulators to configure the operation mode of the internal stream. Even in this case though, it is still possible to add the numbers to the internal stream as shown above.</p>
</div>
<hr>
<h2 class="title">
<a name="stringhandling_exercises"></a>5.7 Exercises</h2>
<div class="sect1">
<ol>
<li class="listitem">
<p>Create a program that extracts and displays data such as first and last name, birthday and account balance from the following XML stream: <strong class="userinput"><code>&lt;person&gt;&lt;name&gt;Karl-Heinz Huber&lt;/name&gt;&lt;dob&gt;1970-9-30&lt;/dob&gt;&lt;account&gt;2,900.64 USD&lt;/account&gt;&lt;/person&gt;</code></strong>.</p>
<p>The first name should be displayed separated from the last name. The birthday should be shown using the typical format of 'day.month.year' while the account balance should omit any decimal place. Test your application with different XML streams that contain additional spaces, a second first name, a negative number for the account balance and so forth.</p>
</li>
<li class="listitem">
<p>Create a program that formats and displays data records such as the following: <strong class="userinput"><code>Munich Hamburg 92.12 8:25 9:45</code></strong>. This record describes a flight from Munich to Hamburg that costs 92.12 Euro, departs at 8:25 AM and arrives at 9:45 AM. It should be displayed as: <code class="computeroutput">Munich    -&gt; Hamburg      92.12 EUR (08:25-09:45)</code>.</p>
<p>More precisely, each city name should be 10 characters wide and left-aligned while the price should be 7 characters wide and right-aligned. After the price, the currency should be displayed. The departure and arrival times should be shown in parentheses, without spaces and separated by a hyphen. For times prior to 10 AM/PM, a leading 0 should be added. Test your application with different data records, e.g. by adding a city name that contains more than 10 characters.</p>
</li>
</ol>
</div>
</div>
<hr class="hrfoot">
<p class="copyright">Copyright © 2008-2010 
        <a class="link" href="mailto:boris@highscore.de">Boris Schäling</a>
      </p>
</body>
</html>
